Engineering a Failure Detection Service for Widely Distributed Systems

Authors

  • Bruno G. Catão
  • Ana Cristina A. Oliveira

Abstract

Unreliable failure detectors are recognized as important building blocks for implementing fault-tolerant distributed systems. Further, there has been much discussion of how to equip them with sophisticated features that allow for adaptation, flexible use, scalability, and quality-of-service enforcement. Despite this, we are not aware of any real distributed system that uses a sophisticated failure detection service. In fact, most deployed systems use the trivial failure detection scheme provided by the underlying communication technologies (e.g., TCP/IP timeouts). We believe that this state of affairs has two main causes: i) there is no widely supported failure detection service API that incorporates these advanced features in a suitable way; and ii) the benefits of using a sophisticated failure detection service are not clearly understood. This paper targets the first issue by proposing a failure detection service that addresses the main needs of widely distributed systems and implements state-of-the-art failure detection mechanisms. Moreover, to improve the usability of the service, we took special care in the design of its programming interface.

Similar resources

Neural Network Based Protection of Software Defined Network Controller against Distributed Denial of Service Attacks

Software Defined Network (SDN) is a new architecture for network management and its main concept is centralizing network management in the network control level that has an overview of the network and determines the forwarding rules for switches and routers (the data level). Although this centralized control is the main advantage of SDN, it is also a single point of failure. If this main contro...

A Novel Passive Method for Islanding Detection in Microgrids

Integration of distributed generations (DGs) in power grids is expected to play an essential role in the infrastructure and market of electrical power systems. Microgrids are small energy systems, capable of balancing captive supply and requesting resources to retain stable service within a specific boundary. Microgrids can operate in grid-connected or islanding modes. Effective islanding detec...

On the Design of a Failure Detection Service for Large-Scale Distributed Systems

It is widely recognized that distributed systems would greatly benefit from the availability of a generic failure detection service. There are however several issues that must be addressed before such a service can actually be implemented. In this paper, we highlight the main issues related to ensuring failure detection in large-scale systems, and overview the main solutions proposed in the lit...

Cold standby redundancy optimization for nonrepairable series-parallel systems: Erlang time to failure distribution

In modeling a cold standby redundancy allocation problem (RAP) with imperfect switching mechanism, deriving a closed form version of a system reliability is too difficult. A convenient lower bound on system reliability is proposed and this approximation is widely used as a part of objective function for a system reliability maximization problem in the literature. Considering this lower bound do...

Radial Basis Neural Network Based Islanding Detection in Distributed Generation

This article presents a Radial Basis Neural Network (RBNN) based islanding detection technique. Islanding detection and prevention is a mandatory requirement for grid-connected distributed generation (DG) systems. Several methods based on passive and active detection scheme have been proposed. While passive schemes have a large non detection zone (NDZ), concern has been raised on active method ...

Publication date: 2005